Character representation
Abstract
Computer character sets are not adequate for encoding texts in any language, not even modern English. In all likelihood this will remain the case for some time to come, though a major improvement is approaching with the implementation of the new ISO 10646 universal coded character set. Because of the present limitations, the standards committee that drew up SGML included one type of general entity for representing characters that the character set lacks, and included entity sets for a number of fields (European languages, mathematics, and publishing) in an appendix of the SGML standard. This work is being expanded by the ISO committee and the TEI.[1] This article addresses the following topics: 7-bit coded character sets, 8-bit coded character sets, the Universal Coded Character Set (UCS), SGML and TEI entity sets, the SGML declaration, the TEI writing system declaration, coded characters and glyphs, standards work, and a conclusion.[2] The focus of this article is on international and national standards on this subject. Therefore, we are discussing primarily the work of the International ...

The Chair of our technical committee submitted a draft of this article. Comments were received from David Birnbaum, Bert Bos, Steve DeRose, Berend Dijk, and Michael Sperberg-McQueen, for which we thank them. The committee made revisions before submitting the final version to the editors of this issue of CHUM. In recognition of longstanding contributions to our work, the committee has elected the above-mentioned contributors as honourable members, which increases our numbers five-fold. We also wish to thank the secretary general of ISO, L.D. Eicher, for permission to publish portions of the ISO standards, and Jan van den Beld, secretary general of ECMA, for furnishing a complete copy of its official register of all known character sets and current versions of the ECMA standards. We also thank Edwin Smura, registrar of AFII, for supplying a copy of much of their font registry.

Vakgroep Alfa Informatica, Rijksuniversiteit Groningen, POB 716, Groningen 9700 AS, The Netherlands. e-mail: [email protected]

[1] Some languages, such as Chinese and Japanese, require a coded character set of more than 128 or 256 characters. There are national character standards for these languages, and unfortunately a number of encoding methods are in use. These languages are excluded from the discussion here; they are a separate subject that deserves an article of its own.

[2] Extracts from ISO 646:1991 and ISO 8859 parts 1 to 10 have been reproduced with the permission of the International Organization for Standardization, ISO. The complete standards can be obtained from your national standards organisation or from the ISO Central Secretariat, Case Postale 56, CH-1211 Geneva 20, Switzerland. Copyright remains with ISO. European Computer Manufacturers Association (ECMA) standards are available free from ECMA, 114 Rue du Rhône, CH-1204 Geneva.
Journal: Computers and the Humanities
Volume: 29
Publication date: 1995